Abstract — Using the idea of interference alignment, Suh and Ramchandran constructed a class of minimum-storage regenerating codes which can repair one systematic or one parity-check node with optimal repair bandwidth. With the same code structure, we show that in addition to single node failures, double node failures can be repaired collaboratively with optimal repair bandwidth as well. We give a detailed description of how to repair multiple failures in the Suh-Ramchandran regenerating code with six nodes, and sketch the proof for the general case.

Index Terms — Distributed storage systems, regenerating codes, 
super-regular matrix. 

I. Introduction 

By a distributed storage system, we mean a method of encoding and distributing a data file of size $B$ to $n$ storage nodes, with the dual purposes that (i) any $k$ nodes are sufficient for rebuilding the original file, and (ii) upon the failure of one or more storage nodes, we can recover the lost information efficiently. Property (i) is called the $(n, k)$ recovery property. We say that a coding scheme satisfies the maximum-distance separable (MDS) property if the $(n, k)$ recovery property is satisfied and each node stores $B/k$ units of data. The MDS property can be achieved by conventional MDS codes such as the Reed-Solomon (RS) codes. However, the traffic required to repair a failed node is very large if RS codes are employed, as the whole file must be downloaded before the lost data in the failed node can be re-encoded. The amount of traffic, measured in the number of packets transmitted from the surviving nodes to the new node, was termed the repair bandwidth by Dimakis et al. in [1]. A lower bound on the repair bandwidth is derived in the same work. A coding scheme whose repair bandwidth attains the lower bound is called a regenerating code.

The repair of failed storage nodes can be carried out in two ways. In the first, called exact repair, the contents of the new nodes are exactly the same as the contents of the failed ones. In the second, called functional repair, the contents need not be recovered exactly, but the $(n, k)$ recovery property is maintained. Exact repair has the advantage that we can store the data file in uncoded form in some nodes, called the systematic nodes, while the other nodes store the parity-check data. If we want to look up a small portion of the data file, we can connect to the node which holds that particular portion, without downloading the whole file. There are several existing constructions of regenerating codes for exact repair. One approach is to apply the idea of interference alignment [2], [3], a concept from wireless communication for characterizing the degrees of freedom of a wireless network. The regenerating code by Suh and Ramchandran [4] is one class of regenerating codes constructed using this technique.

The Suh-Ramchandran code is designed for repairing a single failure. For multiple failures, it was shown by Hu et al. in [5] that by enabling data exchange among the new nodes, the repair bandwidth per new node can be further reduced. The repair process is divided into two phases. In the first phase, each newcomer downloads $\beta_1$ packets from each of $d$ surviving nodes. In the second phase, each pair of newcomers exchanges $\beta_2$ packets in both directions. It was shown in [6] that for any coding scheme satisfying the MDS property, the repair bandwidth per new node is lower bounded by

$$\frac{B(d + r - 1)}{k(d + r - k)}, \tag{1}$$

where $r$ is the number of failed nodes repaired simultaneously, and $d$, called the repair degree, is the number of surviving nodes contacted by a new node during the repair process. When $r = 1$, the lower bound reduces to that for single-node repair in [1]. A regenerating code which repairs multiple-node failures with repair bandwidth per new node attaining the bound in (1) will be referred to as a cooperative or collaborative regenerating code. Explicit constructions of cooperative regenerating codes can be found in [7]-[11]. The objective of this paper is to show that the structure of the Suh-Ramchandran regenerating code also supports multiple-node repair.
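As a quick arithmetic check of (1), the Python sketch below evaluates the bound with exact rational arithmetic; the function name `repair_bandwidth_bound` is our own. For the six-node example of Section IV ($B = 9$, $k = 3$, $n = 6$, and every surviving node contacted, so $d = n - r$), the bound equals 5 packets per newcomer for $r = 1, 2, 3$.

```python
from fractions import Fraction


def repair_bandwidth_bound(B: int, k: int, d: int, r: int) -> Fraction:
    """Lower bound (1) on the repair bandwidth per newcomer.

    B: file size in symbols, k: recovery parameter,
    d: repair degree, r: number of nodes repaired simultaneously.
    """
    if d + r - k <= 0:
        raise ValueError("the bound requires d + r > k")
    return Fraction(B * (d + r - 1), k * (d + r - k))


if __name__ == "__main__":
    n, k = 6, 3
    B = k * k
    for r in (1, 2, 3):
        d = n - r  # every surviving node is contacted
        print(r, d, repair_bandwidth_bound(B, k, d, r))
```

Exact fractions are used so that non-integer bounds (for other parameters) are not silently rounded.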

After reviewing the Suh-Ramchandran construction in Section II, we state the main result of this paper in Section III. In Section IV, an example with $(n, k) = (6, 3)$ is given. The proof of the main theorem is outlined in Section V.

II. The Suh-Ramchandran Construction 

In the Suh-Ramchandran construction, the number of nodes $n$ can be any integer greater than or equal to $2k$. For ease of presentation, we focus on the case $n = 2k$ in this paper. The extension to $n > 2k$ can be done as in [4]. We use notation different from that in [4] in order to emphasize the symmetry of the code, which will be crucial in the derivation of multiple-node recovery.

Let $\mathbb{F}_q$ denote a finite field of size $q$. Each data symbol is regarded as a finite field element, and we will use a symbol as a unit of data. A symbol will also be called a packet. The data file is divided into many data chunks, each containing $B = k^2$ symbols. All data chunks are encoded and treated in the same way. Hence, we only need to describe the operations on one data chunk, and without loss of generality, we can assume that the data file consists of exactly $k^2$ symbols.

The construction requires four non-singular $k \times k$ matrices $U = [u_{ij}]$, $V = [v_{ij}]$, $P = [p_{ij}]$ and $Q = [q_{ij}]$ over $\mathbb{F}_q$, satisfying

$$U = VP \quad \text{and} \quad V = UQ. \tag{2}$$

Denote the columns of $U$ by $u_1, u_2, \ldots, u_k$, and the columns of $V$ by $v_1, v_2, \ldots, v_k$. The columns of $U$ and $V$ are regarded as bases of $\mathbb{F}_q^k$, and the matrices $P$ and $Q$ are the change-of-basis matrices; in particular, $Q = P^{-1}$. The equations in (2) are equivalent to

$$u_j = \sum_{i=1}^{k} p_{ij} v_i, \qquad v_j = \sum_{i=1}^{k} q_{ij} u_i,$$

for $j = 1, 2, \ldots, k$. Let

$$\bar U := (U^t)^{-1} \quad \text{and} \quad \bar V := (V^t)^{-1}, \tag{3}$$

where the superscript $t$ denotes the transpose operator. Let the columns of $\bar U$ be $\bar u_1, \bar u_2, \ldots, \bar u_k$, and the columns of $\bar V$ be $\bar v_1, \bar v_2, \ldots, \bar v_k$. The columns of $\bar U$ (resp. $\bar V$) form the dual basis of $u_1, u_2, \ldots, u_k$ (resp. $v_1, v_2, \ldots, v_k$); that is, $\bar u_i^t u_j$ and $\bar v_i^t v_j$ equal 1 if $i = j$ and 0 otherwise.

Each node stores a column vector of length $k$ over $\mathbb{F}_q$. For $i = 1, 2, \ldots, k$, let the vector stored in node $i$ be denoted by $x_i$, and the vector stored in node $k + i$ by $y_i$. Let $X$ (resp. $Y$) be the $k \times k$ matrix whose columns are the $x_i$ (resp. $y_i$).

In the Suh-Ramchandran construction, we can either (i) let the data in nodes 1 to $k$ be the uncoded data symbols, and generate $Y$ as the parity-check symbols, or (ii) let the data in nodes $k + 1$ to $n$ be the uncoded data symbols, and generate $X$ as the parity-check symbols. In the former case, nodes 1 to $k$ are the systematic nodes, and the information stored in them is the entries of the matrix $X$. The parity-check symbols in nodes $k + 1$ to $n$ are obtained by

$$Y = \delta \bar V X^t U + \epsilon X P. \tag{4}$$

The variables $\delta$ and $\epsilon$ are elements of $\mathbb{F}_q$ to be determined later. In the latter case, nodes $k + 1$ to $n$ are the systematic nodes, and the data stored in the matrix $Y$ are the uncoded source symbols. The symbols in nodes 1 to $k$ are parity-check symbols obtained by

$$X = \delta' \bar U Y^t V + \epsilon' Y Q, \tag{5}$$

where $\delta'$ and $\epsilon'$ are also elements of $\mathbb{F}_q$.

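Let

$$z_j := \sum_{i=1}^{k} p_{ij} x_i \quad \text{and} \quad z'_j := \sum_{i=1}^{k} q_{ij} y_i.$$

For $j = 1, 2, \ldots, k$, the data stored in node $k + j$ can be expressed as

$$y_j = \delta \sum_{i=1}^{k} \bar v_i u_j^t x_i + \epsilon z_j,$$

and the data stored in node $j$ is

$$x_j = \delta' \sum_{i=1}^{k} \bar u_i v_j^t y_i + \epsilon' z'_j.$$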

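Theorem 1. Let $F(X) = \delta \bar V X^t U + \epsilon X P$ and $G(Y) = \delta' \bar U Y^t V + \epsilon' Y Q$ be linear transformations from the vector space of $k \times k$ matrices to itself. If we choose $\delta$, $\delta'$, $\epsilon$ and $\epsilon'$ such that

$$\delta\delta' + \epsilon\epsilon' = 1, \tag{6}$$

$$\epsilon\delta' + \delta\epsilon' = 0, \tag{7}$$

then the compositions $F \circ G$ and $G \circ F$ are the identity transformation.

Proof: For all $k \times k$ matrices $X$, we have

$$G(F(X)) = \delta' \bar U (\delta U^t X \bar V^t + \epsilon P^t X^t) V + \epsilon' (\delta \bar V X^t U + \epsilon X P) Q = (\delta\delta' + \epsilon\epsilon') X + (\epsilon\delta' + \delta\epsilon') \bar V X^t V = X,$$

where we have used $\bar U U^t = I$, $\bar V^t V = I$, $\bar U P^t = \bar V$, $UQ = V$ and $PQ = I$. The proof of $F(G(Y)) = Y$ is similar. ■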
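Theorem 1 can be sanity-checked numerically. The Python sketch below is our own illustration: it works over $\mathbb{F}_7$ with $\delta = \delta' = 3$, $\epsilon = -\epsilon' = 1$ and the Cauchy matrix $P$ of the numerical example in Section IV, but uses a hypothetical non-identity matrix $V$ (chosen by us, so that the dual bases in (4) and (5) actually matter). It builds $F$ and $G$ from (4) and (5) and checks that both compositions return the input. All helper names are illustrative.

```python
import random

p = 7  # field size of the numerical example in Section IV


def matmul(A, B):
    """Matrix product over F_p."""
    return [[sum(A[i][l] * B[l][j] for l in range(len(B))) % p
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


def inverse(A):
    """Gauss-Jordan inversion over F_p; raises ValueError if A is singular."""
    n = len(A)
    M = [list(row) + [int(i == j) for j in range(n)] for i, row in enumerate(A)]
    for c in range(n):
        piv = next((r for r in range(c, n) if M[r][c] % p), None)
        if piv is None:
            raise ValueError("singular matrix")
        M[c], M[piv] = M[piv], M[c]
        inv = pow(M[c][c], -1, p)
        M[c] = [e * inv % p for e in M[c]]
        for r in range(n):
            if r != c and M[r][c] % p:
                f = M[r][c]
                M[r] = [(e - f * g) % p for e, g in zip(M[r], M[c])]
    return [row[n:] for row in M]


def lin(s, A, t, B):
    """Entrywise s*A + t*B over F_p."""
    return [[(s * a + t * b) % p for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


delta, eps = 3, 1
delta2, eps2 = 3, p - 1                  # delta' = 3, epsilon' = -1 in F_7
V = [[1, 2, 0], [0, 1, 3], [2, 0, 1]]    # hypothetical non-singular V
P = [[6, 1, 4], [1, 5, 2], [4, 2, 3]]    # Cauchy matrix of Section IV
Q = inverse(P)
U = matmul(V, P)
U_bar, V_bar = inverse(transpose(U)), inverse(transpose(V))


def F(X):
    """Encoding map (4): delta * V_bar X^t U + eps * X P."""
    return lin(delta, matmul(matmul(V_bar, transpose(X)), U), eps, matmul(X, P))


def G(Y):
    """Encoding map (5): delta' * U_bar Y^t V + eps' * Y Q."""
    return lin(delta2, matmul(matmul(U_bar, transpose(Y)), V), eps2, matmul(Y, Q))


rng = random.Random(1)
X = [[rng.randrange(p) for _ in range(3)] for _ in range(3)]
print(G(F(X)) == X, F(G(X)) == X)
```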
In [4], Suh and Ramchandran prove the following.

Theorem 2 ([4]). The Suh-Ramchandran regenerating code satisfies the MDS property if all square submatrices of the matrix $P$ are non-singular.

We call a matrix super-regular if all of its square submatrices are non-singular. It can be proved that the inverse of a super-regular matrix is also super-regular. Therefore, in Theorem 2 it is equivalent to require the matrix $Q$ to be super-regular.

III. Main Result 

The main result of this paper shows that the Suh-Ramchandran regenerating code, which was originally constructed for repairing a single node failure, can also repair certain patterns of multiple-node failures optimally.

Theorem 3. Suppose that in the Suh-Ramchandran construction, the parameters $V$, $P$, $\epsilon$, $\delta$, $\epsilon'$ and $\delta'$ are chosen such that

• $V$ is a $k \times k$ non-singular matrix over $\mathbb{F}_q$,
• $P$ is a $k \times k$ super-regular matrix over $\mathbb{F}_q$,
• $\epsilon$, $\delta$, $\epsilon'$ and $\delta'$ are non-zero and satisfy (6) and (7),
• $p_{ij} q_{ji} \neq 1$ for $1 \le i, j \le k$,

where $q_{ji}$ is the $(j, i)$-entry of $P^{-1}$. Then we can jointly repair

• $r$ systematic nodes, for any $r$ between 1 and $k$,
• $r$ parity-check nodes, for any $r$ between 1 and $k$,
• any pair consisting of one systematic node and one parity-check node,

with repair bandwidth attaining the lower bound in (1) and repair degree $d$ equal to $n$ minus the number of failed nodes repaired cooperatively.
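We note that (6) and (7) imply $\delta^2 \neq \epsilon^2$ and $(\delta')^2 \neq (\epsilon')^2$. Indeed, after squaring both sides of (6) and (7) and subtracting, we get $(\delta^2 - \epsilon^2)((\delta')^2 - (\epsilon')^2) = 1$. Hence, the determinant of the $2 \times 2$ matrix in

$$\begin{bmatrix} \delta & \epsilon \\ \epsilon & \delta \end{bmatrix} \begin{bmatrix} \delta' \\ \epsilon' \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \end{bmatrix} \tag{8}$$

is necessarily non-zero. We can choose $\delta$ and $\epsilon$ to be a pair of non-zero elements of $\mathbb{F}_q$ such that $\delta^2 \neq \epsilon^2$, and then obtain $\epsilon'$ and $\delta'$ by solving (8), namely $\delta' = \delta/(\delta^2 - \epsilon^2)$ and $\epsilon' = -\epsilon/(\delta^2 - \epsilon^2)$. The values of $\epsilon'$ and $\delta'$ so obtained are therefore non-zero.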
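In a prime field, solving (8) is a direct application of Cramer's rule. A minimal Python sketch, assuming $q$ is prime so that the integers modulo $q$ form $\mathbb{F}_q$ (the function name is hypothetical):

```python
def solve_dual_parameters(delta: int, eps: int, q: int) -> tuple:
    """Solve (8) for (delta', eps') over the prime field F_q by Cramer's rule."""
    det = (delta * delta - eps * eps) % q
    if delta % q == 0 or eps % q == 0 or det == 0:
        raise ValueError("need non-zero delta, eps with delta^2 != eps^2")
    inv = pow(det, -1, q)              # modular inverse (Python 3.8+)
    return delta * inv % q, (-eps) * inv % q


d2, e2 = solve_dual_parameters(3, 1, 7)
print(d2, e2)  # delta' = 3 and epsilon' = -1 = 6 in F_7
```

With $\delta = 3$ and $\epsilon = 1$ over $\mathbb{F}_7$ this reproduces the values $\delta' = 3$ and $\epsilon' = -1$ used in the numerical example of Section IV.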

Choosing the entries of $P$ so that the conditions in Theorem 3 are satisfied requires a sufficiently large finite field. For a Cauchy matrix $P = [(a_i - b_j)^{-1}]$, where $a_1, \ldots, a_k, b_1, \ldots, b_k$ are distinct elements of $\mathbb{F}_q$, the $(j, i)$-entry of $P^{-1}$ can be calculated by

$$q_{ji} = \frac{\prod_{\ell=1}^{k} (a_i - b_\ell)(a_\ell - b_j)}{(a_i - b_j) \prod_{\ell \neq i} (a_i - a_\ell) \prod_{\ell \neq j} (b_j - b_\ell)}. \tag{9}$$

See, for example, [12] for a derivation of (9). Hence, the condition $p_{ij} q_{ji} \neq 1$ is equivalent to

$$\prod_{\ell=1}^{k} (a_i - b_\ell)(a_\ell - b_j) - (a_i - b_j)^2 \prod_{\ell \neq i} (a_i - a_\ell) \prod_{\ell \neq j} (b_j - b_\ell) \neq 0.$$

Let $F_{ij}$ be the left-hand side of the above equation, regarded as a multivariate polynomial in the $a_i$'s and $b_j$'s. Constructing a Cauchy matrix $P$ satisfying the conditions in Theorem 3 amounts to finding $a_i$'s and $b_j$'s such that the product $\prod_{1 \le i, j \le k} F_{ij}$ evaluates to a non-zero value. By the Schwartz-Zippel lemma [13, Corollary 19.18], this can be done provided that the finite field size $q$ is sufficiently large.
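For small parameters, the conditions can simply be checked by brute force. The Python sketch below (function names are our own; it assumes $q$ is prime) builds a Cauchy matrix, evaluates $P^{-1}$ entry by entry through (9), tests super-regularity by enumerating all square submatrices, and checks $p_{ij} q_{ji} \neq 1$. It is run on the parameters of the numerical example in Section IV.

```python
from itertools import combinations


def cauchy(a, b, q):
    """P = [(a_i - b_j)^{-1}] over the prime field F_q."""
    return [[pow((ai - bj) % q, -1, q) for bj in b] for ai in a]


def q_entry(a, b, i, j, q):
    """(j, i)-entry of P^{-1} computed from formula (9); indices are 0-based."""
    k = len(a)
    num, den = 1, (a[i] - b[j]) % q
    for l in range(k):
        num = num * (a[i] - b[l]) * (a[l] - b[j]) % q
        if l != i:
            den = den * (a[i] - a[l]) % q
        if l != j:
            den = den * (b[j] - b[l]) % q
    return num * pow(den, -1, q) % q


def det_mod(M, q):
    """Determinant over F_q by Gaussian elimination."""
    M = [row[:] for row in M]
    n, det = len(M), 1
    for c in range(n):
        piv = next((r for r in range(c, n) if M[r][c] % q), None)
        if piv is None:
            return 0
        if piv != c:
            M[c], M[piv] = M[piv], M[c]
            det = -det
        det = det * M[c][c] % q
        inv = pow(M[c][c], -1, q)
        for r in range(c + 1, n):
            f = M[r][c] * inv % q
            M[r] = [(x - f * y) % q for x, y in zip(M[r], M[c])]
    return det % q


def is_super_regular(P, q):
    """True if every square submatrix of P is non-singular over F_q."""
    k = len(P)
    return all(det_mod([[P[r][c] for c in cols] for r in rows], q) != 0
               for s in range(1, k + 1)
               for rows in combinations(range(k), s)
               for cols in combinations(range(k), s))


q, a, b = 7, (1, 3, 4), (2, 0, 6)            # parameters of the numerical example
P = cauchy(a, b, q)
Q = [[q_entry(a, b, i, j, q) for i in range(3)] for j in range(3)]  # Q[j][i] = q_ji
ok = is_super_regular(P, q) and all(P[i][j] * Q[j][i] % q != 1
                                    for i in range(3) for j in range(3))
print(P, Q, ok)
```

For larger $k$ the number of submatrices grows quickly, so this exhaustive check is only meant for small examples.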



Corollary 4. With a sufficiently large finite field $\mathbb{F}_q$, we can repair single and double node failures in the Suh-Ramchandran regenerating code with optimal repair bandwidth.

This disproves the assertion in [9] that "it is not possible to repair exactly MSCR code with $k \ge 3$ and $r \ge 2$ in the scalar case, such that each node stores $\alpha = d - k + r$ packets." In the next section, we give an example with $k = 3$ and $r = 2$, in which $d = 4$ and each node stores $\alpha = 3$ packets.

IV. An Example for n = 6 and k = 3 

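Encoding. There are $B = 9$ symbols to be encoded and distributed to the storage nodes. Let us agree that the first three nodes are systematic nodes, and the last three nodes are parity-check nodes. Each node stores a column vector of length 3. We let $V = [v_1 | v_2 | v_3]$ be a non-singular $3 \times 3$ matrix, and

$$P = \begin{bmatrix} p_{11} & p_{12} & p_{13} \\ p_{21} & p_{22} & p_{23} \\ p_{31} & p_{32} & p_{33} \end{bmatrix}$$

be a Cauchy matrix, so that the MDS property is guaranteed by Theorem 2. Let $U = [u_1 | u_2 | u_3] = VP$ and denote the inverse of $P$ by

$$Q = \begin{bmatrix} q_{11} & q_{12} & q_{13} \\ q_{21} & q_{22} & q_{23} \\ q_{31} & q_{32} & q_{33} \end{bmatrix}.$$

The contents of the six nodes are as follows.

Node 1: $x_1$
Node 2: $x_2$
Node 3: $x_3$
Node 4: $y_1 = \delta \sum_{j=1}^{3} \bar v_j u_1^t x_j + \epsilon z_1$
Node 5: $y_2 = \delta \sum_{j=1}^{3} \bar v_j u_2^t x_j + \epsilon z_2$
Node 6: $y_3 = \delta \sum_{j=1}^{3} \bar v_j u_3^t x_j + \epsilon z_3$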

Upon the failure of a set of nodes, which possibly contains more than one node, each surviving node takes linear combinations of its stored symbols and sends the results to the newcomers. If node $i$ is one of the failed nodes, for $i = 1, 2, 3$, the surviving nodes send the inner products of their stored vectors with $v_i$ to newcomer $i$. If node $3 + j$ is one of the failed nodes, for $j = 1, 2, 3$, the surviving nodes send the inner products of their stored vectors with $u_j$ to newcomer $3 + j$.

Repair of one parity-check or one systematic node 

In view of the symmetry between $X$ and $Y$ in (4) and (5), we only need to describe how to repair a parity-check node. Without loss of generality, consider the repair of parity-check node 4.

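Upon the failure of node 4, nodes 1, 2, 3, 5 and 6 send, respectively, $u_1^t x_1$, $u_1^t x_2$, $u_1^t x_3$, $u_1^t y_2$ and $u_1^t y_3$ to newcomer 4. Since $u_1^t \bar v_i = p_{i1}$, we can write, in terms of $z_1$, $z_2$ and $z_3$,

$$u_1^t y_2 = \delta u_2^t z_1 + \epsilon u_1^t z_2,$$

$$u_1^t y_3 = \delta u_3^t z_1 + \epsilon u_1^t z_3.$$

Using $u_1^t x_1$, $u_1^t x_2$ and $u_1^t x_3$, newcomer 4 can compute $u_1^t z_1$, $u_1^t z_2$ and $u_1^t z_3$, and hence obtain $u_2^t z_1$ and $u_3^t z_1$ from the two equations above, provided that $\delta \neq 0$. Because $u_1, u_2, u_3$ are linearly independent by construction, newcomer 4 can solve for $z_1$ from $u_1^t z_1$, $u_2^t z_1$ and $u_3^t z_1$. Then the required packets in

$$y_1 = \delta (\bar v_1 u_1^t x_1 + \bar v_2 u_1^t x_2 + \bar v_3 u_1^t x_3) + \epsilon z_1$$

can be recovered exactly.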
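The repair of node 4 can be simulated end to end. The Python sketch below is our own illustration, not code from [4]: it works over $\mathbb{F}_7$ with $\delta = 3$, $\epsilon = 1$, the Cauchy matrix $P$ of the numerical example and a hypothetical non-identity $V$. It encodes a random $X$ with (4), lets newcomer 4 collect the five inner products with $u_1$, and rebuilds $y_1$ following the steps above. Helper names are illustrative.

```python
import random

p, k = 7, 3
delta, eps = 3, 1
V = [[1, 2, 0], [0, 1, 3], [2, 0, 1]]   # hypothetical non-singular V
P = [[6, 1, 4], [1, 5, 2], [4, 2, 3]]   # Cauchy matrix of Section IV


def matmul(A, B):
    return [[sum(A[i][l] * B[l][j] for l in range(len(B))) % p
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


def inverse(A):
    """Gauss-Jordan inversion over F_p (A is assumed non-singular)."""
    n = len(A)
    M = [list(row) + [int(i == j) for j in range(n)] for i, row in enumerate(A)]
    for c in range(n):
        piv = next(r for r in range(c, n) if M[r][c] % p)
        M[c], M[piv] = M[piv], M[c]
        inv = pow(M[c][c], -1, p)
        M[c] = [e * inv % p for e in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [(e - f * g) % p for e, g in zip(M[r], M[c])]
    return [row[n:] for row in M]


def dot(a, b):
    return sum(s * t for s, t in zip(a, b)) % p


U = matmul(V, P)
U_bar, V_bar = inverse(transpose(U)), inverse(transpose(V))
u, vbar = transpose(U), transpose(V_bar)       # columns u_j and bar v_i

# Encode a random source matrix X with (4).
rng = random.Random(7)
X = [[rng.randrange(p) for _ in range(k)] for _ in range(k)]
Y = [[(delta * s + eps * t) % p for s, t in zip(r1, r2)]
     for r1, r2 in zip(matmul(matmul(V_bar, transpose(X)), U), matmul(X, P))]
x, y = transpose(X), transpose(Y)

# Node 4 (y_1) fails: each survivor sends its inner product with u_1.
ux = [dot(u[0], xi) for xi in x]               # from nodes 1, 2, 3
uy = {j: dot(u[0], y[j]) for j in (1, 2)}      # from nodes 5, 6

# Newcomer 4: u_1^t z_j for all j, then u_j^t z_1 for j = 2, 3 (needs delta != 0).
u1z = [sum(P[i][j] * ux[i] for i in range(k)) % p for j in range(k)]
inv_delta = pow(delta, -1, p)
w = [u1z[0]] + [(uy[j] - eps * u1z[j]) * inv_delta % p for j in (1, 2)]
z1 = [dot(row, w) for row in U_bar]            # solves U^t z_1 = w
y1 = [(delta * sum(vbar[i][r] * ux[i] for i in range(k)) + eps * z1[r]) % p
      for r in range(k)]
print(y1 == y[0])
```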

Repair of two parity-check or two systematic nodes 

By the symmetry between $X$ and $Y$, it suffices to consider the repair of two parity-check nodes, say nodes 4 and 5. After the first phase of the repair process, newcomer 4 receives four symbols, $u_1^t x_1$, $u_1^t x_2$, $u_1^t x_3$ and

$$u_1^t y_3 = \delta u_3^t z_1 + \epsilon u_1^t z_3.$$

The symbols received by newcomer 5 are $u_2^t x_1$, $u_2^t x_2$, $u_2^t x_3$ and

$$u_2^t y_3 = \delta u_3^t z_2 + \epsilon u_2^t z_3.$$

Recall that newcomer 5 wants to regenerate

$$y_2 = \delta (\bar v_1 u_2^t x_1 + \bar v_2 u_2^t x_2 + \bar v_3 u_2^t x_3) + \epsilon z_2. \tag{10}$$

The first term can be obtained from $u_2^t x_1$, $u_2^t x_2$ and $u_2^t x_3$. For the second term, newcomer 5 first calculates

$$u_2^t z_2 = p_{12} u_2^t x_1 + p_{22} u_2^t x_2 + p_{32} u_2^t x_3,$$

$$u_3^t z_2 = \frac{1}{\delta} \left( u_2^t y_3 - \epsilon p_{13} u_2^t x_1 - \epsilon p_{23} u_2^t x_2 - \epsilon p_{33} u_2^t x_3 \right),$$

and then asks newcomer 4 for a copy of $u_1^t z_2$, which newcomer 4 can compute as

$$u_1^t z_2 = p_{12} u_1^t x_1 + p_{22} u_1^t x_2 + p_{32} u_1^t x_3.$$

In the computation of $u_3^t z_2$, we clearly need to impose the condition $\delta \neq 0$. Then, by the linear independence of $u_1$, $u_2$ and $u_3$, newcomer 5 can recover $z_2$ and hence regenerate the second term in (10).

Similarly, newcomer 4 can regenerate $y_1$ after newcomer 5 has sent $u_2^t z_1$ to newcomer 4. Each newcomer thus downloads four packets in the first phase and receives one packet in the second phase, for a total of five packets, which meets the bound in (1) with $B = 9$, $k = 3$, $d = 4$ and $r = 2$.
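The two-phase repair of nodes 4 and 5 can be simulated in the same way. In the Python sketch below (our own illustration, with the same hypothetical $V$ and the parameters of the numerical example), the dictionary `phase1` holds exactly the four packets each newcomer downloads, and `phase2` holds the single packet it receives from the other newcomer; each newcomer then rebuilds its vector from (10).

```python
import random

p, k = 7, 3
delta, eps = 3, 1
V = [[1, 2, 0], [0, 1, 3], [2, 0, 1]]   # hypothetical non-singular V
P = [[6, 1, 4], [1, 5, 2], [4, 2, 3]]   # Cauchy matrix of Section IV


def matmul(A, B):
    return [[sum(A[i][l] * B[l][j] for l in range(len(B))) % p
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


def inverse(A):
    """Gauss-Jordan inversion over F_p (A is assumed non-singular)."""
    n = len(A)
    M = [list(row) + [int(i == j) for j in range(n)] for i, row in enumerate(A)]
    for c in range(n):
        piv = next(r for r in range(c, n) if M[r][c] % p)
        M[c], M[piv] = M[piv], M[c]
        inv = pow(M[c][c], -1, p)
        M[c] = [e * inv % p for e in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [(e - f * g) % p for e, g in zip(M[r], M[c])]
    return [row[n:] for row in M]


def dot(a, b):
    return sum(s * t for s, t in zip(a, b)) % p


U = matmul(V, P)
U_bar, V_bar = inverse(transpose(U)), inverse(transpose(V))
u, vbar = transpose(U), transpose(V_bar)

rng = random.Random(11)
X = [[rng.randrange(p) for _ in range(k)] for _ in range(k)]
Y = [[(delta * s + eps * t) % p for s, t in zip(r1, r2)]
     for r1, r2 in zip(matmul(matmul(V_bar, transpose(X)), U), matmul(X, P))]
x, y = transpose(X), transpose(Y)

failed, surv = (0, 1), 2     # nodes 4 and 5 (y_1, y_2) fail; node 6 (y_3) survives
# Phase 1: newcomer for y_m downloads u_m^t x_1, u_m^t x_2, u_m^t x_3 and u_m^t y_3.
phase1 = {m: ([dot(u[m], xi) for xi in x], dot(u[m], y[surv])) for m in failed}


def uz(m, j):
    """u_m^t z_j, computed by newcomer m from its own phase-1 data."""
    return sum(P[i][j] * phase1[m][0][i] for i in range(k)) % p


# Phase 2: the other newcomer o = 1 - m sends u_o^t z_m to newcomer m.
phase2 = {m: uz(1 - m, m) for m in failed}


def regenerate(m):
    ux, uy_surv = phase1[m]
    w = [0] * k                                   # w[c] = u_c^t z_m
    w[m] = uz(m, m)
    w[surv] = (uy_surv - eps * uz(m, surv)) * pow(delta, -1, p) % p
    w[1 - m] = phase2[m]
    z = [dot(row, w) for row in U_bar]            # solves U^t z_m = w
    return [(delta * sum(vbar[i][r] * ux[i] for i in range(k)) + eps * z[r]) % p
            for r in range(k)]


print([regenerate(m) == y[m] for m in failed])
```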

Repair of three parity-check or three systematic nodes 

Suppose nodes 4, 5 and 6 fail. Newcomer 4 receives $u_1^t x_1$, $u_1^t x_2$ and $u_1^t x_3$, newcomer 5 receives $u_2^t x_1$, $u_2^t x_2$ and $u_2^t x_3$, and newcomer 6 receives $u_3^t x_1$, $u_3^t x_2$ and $u_3^t x_3$.

Consider the repair of node 4. After the first phase of the repair process, newcomer 4 computes $\delta \sum_{j=1}^{3} \bar v_j u_1^t x_j$ and $u_1^t z_1$. Next, newcomers 5 and 6 send $u_2^t z_1$ and $u_3^t z_1$, respectively, to newcomer 4. Then newcomer 4 can decode $z_1$ from $u_1^t z_1$, $u_2^t z_1$ and $u_3^t z_1$, by the linear independence of $u_1$, $u_2$ and $u_3$. The content of node 4 is recovered by adding $\delta \sum_{j=1}^{3} \bar v_j u_1^t x_j$ and $\epsilon z_1$.

The repair of nodes 5 and 6 is similar. Each newcomer downloads three packets and receives two, for a total of five packets, again meeting (1) with $d = 3$ and $r = 3$.
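A sketch of the three-node repair, under the same assumptions as the previous simulations (our own code, hypothetical $V$, parameters of the numerical example): every newcomer can form $u_m^t z_j$ for all $j$ from its three downloaded packets, and in the second phase it only needs the two values $u_c^t z_m$ held by the other newcomers.

```python
import random

p, k = 7, 3
delta, eps = 3, 1
V = [[1, 2, 0], [0, 1, 3], [2, 0, 1]]   # hypothetical non-singular V
P = [[6, 1, 4], [1, 5, 2], [4, 2, 3]]   # Cauchy matrix of Section IV


def matmul(A, B):
    return [[sum(A[i][l] * B[l][j] for l in range(len(B))) % p
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


def inverse(A):
    """Gauss-Jordan inversion over F_p (A is assumed non-singular)."""
    n = len(A)
    M = [list(row) + [int(i == j) for j in range(n)] for i, row in enumerate(A)]
    for c in range(n):
        piv = next(r for r in range(c, n) if M[r][c] % p)
        M[c], M[piv] = M[piv], M[c]
        inv = pow(M[c][c], -1, p)
        M[c] = [e * inv % p for e in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [(e - f * g) % p for e, g in zip(M[r], M[c])]
    return [row[n:] for row in M]


def dot(a, b):
    return sum(s * t for s, t in zip(a, b)) % p


U = matmul(V, P)
U_bar, V_bar = inverse(transpose(U)), inverse(transpose(V))
u, vbar = transpose(U), transpose(V_bar)

rng = random.Random(13)
X = [[rng.randrange(p) for _ in range(k)] for _ in range(k)]
Y = [[(delta * s + eps * t) % p for s, t in zip(r1, r2)]
     for r1, r2 in zip(matmul(matmul(V_bar, transpose(X)), U), matmul(X, P))]
x, y = transpose(X), transpose(Y)

# Phase 1: newcomer for y_m downloads u_m^t x_1, u_m^t x_2, u_m^t x_3.
phase1 = {m: [dot(u[m], xi) for xi in x] for m in range(k)}
# Each newcomer m forms u_m^t z_j = sum_i p_ij u_m^t x_i for every j.
uz = {(m, j): sum(P[i][j] * phase1[m][i] for i in range(k)) % p
      for m in range(k) for j in range(k)}


def regenerate(m):
    # Phase 2: newcomer c != m sends u_c^t z_m; newcomer m already has u_m^t z_m.
    w = [uz[(c, m)] for c in range(k)]
    z = [dot(row, w) for row in U_bar]             # solves U^t z_m = w
    return [(delta * sum(vbar[i][r] * phase1[m][i] for i in range(k)) + eps * z[r]) % p
            for r in range(k)]


print([regenerate(m) == y[m] for m in range(k)])
```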

Exact repair of one systematic node and one parity-check node

Without loss of generality, suppose nodes 1 and 5 fail. We want to replace them by newcomer 1 and newcomer 5.

After the first phase of the repair process, newcomer 1 receives $v_1^t x_2$, $v_1^t x_3$,

$$v_1^t y_1 = \delta u_1^t x_1 + \epsilon v_1^t z_1, \quad \text{and}$$

$$v_1^t y_3 = \delta u_3^t x_1 + \epsilon v_1^t z_3,$$

and newcomer 5 receives $u_2^t x_2$, $u_2^t x_3$,

$$u_2^t y_1 = \delta u_1^t z_2 + \epsilon u_2^t z_1, \quad \text{and}$$

$$u_2^t y_3 = \delta u_3^t z_2 + \epsilon u_2^t z_3.$$

Newcomer 5 computes a linear combination of the received symbols,

$$q_{11} u_2^t y_1 + q_{31} u_2^t y_3 + (\delta + \epsilon) \left[ p_{22} q_{21} u_2^t x_2 + p_{32} q_{21} u_2^t x_3 \right].$$

The coefficients are chosen so that it can be simplified to

$$\delta v_1^t z_2 + (\epsilon - (\epsilon + \delta) p_{12} q_{21}) u_2^t x_1, \tag{11}$$

which is a linear combination of $v_1^t z_2$ and $u_2^t x_1$. (We have used $v_1 = \sum_{i} q_{i1} u_i$ and the orthogonality relation that $\sum_{\ell=1}^{3} p_{i\ell} q_{\ell j}$ is equal to the Kronecker delta function, i.e., 1 if $i = j$ and 0 otherwise.) In the second phase of the repair process, newcomer 5 sends the symbol in (11) to newcomer 1.

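Since newcomer 1 knows $v_1^t x_2$ and $v_1^t x_3$, it can compute

$$(\delta p_{12} v_1^t + (\epsilon - (\epsilon + \delta) p_{12} q_{21}) u_2^t) x_1$$

by subtracting $\delta p_{22} v_1^t x_2$ and $\delta p_{32} v_1^t x_3$ from (11). Next, newcomer 1 calculates

$$v_1^t y_1 - \epsilon p_{21} v_1^t x_2 - \epsilon p_{31} v_1^t x_3 = (\delta u_1^t + \epsilon p_{11} v_1^t) x_1, \quad \text{and}$$

$$v_1^t y_3 - \epsilon p_{23} v_1^t x_2 - \epsilon p_{33} v_1^t x_3 = (\delta u_3^t + \epsilon p_{13} v_1^t) x_1.$$

The vector $x_1$ can be recovered if the matrix

$$\begin{bmatrix} (\epsilon - (\epsilon + \delta) p_{12} q_{21}) u_2^t + \delta p_{12} v_1^t \\ \delta u_1^t + \epsilon p_{11} v_1^t \\ \delta u_3^t + \epsilon p_{13} v_1^t \end{bmatrix} \tag{12}$$

is non-singular.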
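The recovery of the systematic vector can also be traced numerically. The Python sketch below is our own illustration (hypothetical $V$, parameters of the numerical example, helper names ours). For failed nodes $a$ and $3 + t$ (0-based in the code), it forms the second-phase message exactly as newcomer $3 + t$ would, builds the three rows and right-hand sides that newcomer $a$ can compute (for $a = 1$, $t = 2$ these are the rows of (12)), and solves the resulting $3 \times 3$ system. It loops over all nine pairs.

```python
import random

p, k = 7, 3
delta, eps = 3, 1
V = [[1, 2, 0], [0, 1, 3], [2, 0, 1]]   # hypothetical non-singular V
P = [[6, 1, 4], [1, 5, 2], [4, 2, 3]]   # Cauchy matrix of Section IV


def matmul(A, B):
    return [[sum(A[i][l] * B[l][j] for l in range(len(B))) % p
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


def inverse(A):
    """Gauss-Jordan inversion over F_p (A is assumed non-singular)."""
    n = len(A)
    M = [list(row) + [int(i == j) for j in range(n)] for i, row in enumerate(A)]
    for c in range(n):
        piv = next(r for r in range(c, n) if M[r][c] % p)
        M[c], M[piv] = M[piv], M[c]
        inv = pow(M[c][c], -1, p)
        M[c] = [e * inv % p for e in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [(e - f * g) % p for e, g in zip(M[r], M[c])]
    return [row[n:] for row in M]


def dot(a, b):
    return sum(s * t for s, t in zip(a, b)) % p


Q = inverse(P)
U = matmul(V, P)
V_bar = inverse(transpose(V))
u, v = transpose(U), transpose(V)              # columns u_j and v_i

rng = random.Random(5)
X = [[rng.randrange(p) for _ in range(k)] for _ in range(k)]
Y = [[(delta * s + eps * t) % p for s, t in zip(r1, r2)]
     for r1, r2 in zip(matmul(matmul(V_bar, transpose(X)), U), matmul(X, P))]
x, y = transpose(X), transpose(Y)


def recover_systematic(a, t):
    """Newcomer a rebuilds x_a when nodes a and k + t fail (0-based a, t)."""
    keep_x = [i for i in range(k) if i != a]       # surviving systematic nodes
    keep_y = [j for j in range(k) if j != t]       # surviving parity-check nodes
    vx = {i: dot(v[a], x[i]) for i in keep_x}      # phase 1 of newcomer a
    vy = {j: dot(v[a], y[j]) for j in keep_y}
    ux = {i: dot(u[t], x[i]) for i in keep_x}      # phase 1 of newcomer k + t
    uy = {j: dot(u[t], y[j]) for j in keep_y}
    # Phase 2: newcomer k + t sends the combination that simplifies as in (11).
    msg = (sum(Q[j][a] * uy[j] for j in keep_y)
           + (delta + eps) * sum(P[i][t] * Q[t][a] * ux[i] for i in keep_x)) % p
    c = (eps - (eps + delta) * P[a][t] * Q[t][a]) % p
    rows = [[(delta * P[a][t] * v[a][r] + c * u[t][r]) % p for r in range(k)]]
    rhs = [(msg - delta * sum(P[i][t] * vx[i] for i in keep_x)) % p]
    for j in keep_y:
        rows.append([(delta * u[j][r] + eps * P[a][j] * v[a][r]) % p for r in range(k)])
        rhs.append((vy[j] - eps * sum(P[i][j] * vx[i] for i in keep_x)) % p)
    return [dot(row, rhs) for row in inverse(rows)]


print(all(recover_systematic(a, t) == x[a] for a in range(k) for t in range(k)))
```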

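Using the symmetry of the code, newcomer 5 can recover the lost information in a similar way. In the second phase of the repair process, newcomer 1 sends

$$\delta' u_2^t z'_1 + (\epsilon' - (\epsilon' + \delta') q_{21} p_{12}) v_1^t y_2$$

to newcomer 5. Then newcomer 5 can decode $y_2$ if the matrix

$$\begin{bmatrix} (\epsilon' - (\epsilon' + \delta') q_{21} p_{12}) v_1^t + \delta' q_{21} u_2^t \\ \delta' v_2^t + \epsilon' q_{22} u_2^t \\ \delta' v_3^t + \epsilon' q_{23} u_2^t \end{bmatrix} \tag{13}$$

is non-singular.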

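In summary, the variables $\epsilon$ and $\delta$ should be chosen such that (6) and (7) are satisfied, and $\delta \neq 0 \neq \epsilon$. The entries of $V$ and $P$ should be chosen such that $V$ is non-singular and, for all permutations $(a, b, c)$ and $(x, y, z)$ of $\{1, 2, 3\}$, the matrices

$$\begin{bmatrix} (\epsilon - (\epsilon + \delta) p_{ax} q_{xa}) u_x^t + \delta p_{ax} v_a^t \\ \delta u_y^t + \epsilon p_{ay} v_a^t \\ \delta u_z^t + \epsilon p_{az} v_a^t \end{bmatrix} \tag{14}$$

and

$$\begin{bmatrix} (\epsilon' - (\epsilon' + \delta') q_{xa} p_{ax}) v_a^t + \delta' q_{xa} u_x^t \\ \delta' v_b^t + \epsilon' q_{xb} u_x^t \\ \delta' v_c^t + \epsilon' q_{xc} u_x^t \end{bmatrix} \tag{15}$$

are non-singular.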
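For the numerical example given later in this section ($q = 7$, $V = I$, $\delta = \delta' = 3$, $\epsilon = -\epsilon' = 1$), the non-singularity of (14) and (15) can be checked directly over all 36 pairs of permutations. The Python sketch below is our own brute-force check; it builds each row as a vector in $\mathbb{F}_7^3$ and evaluates a $3 \times 3$ determinant modulo 7.

```python
from itertools import permutations

q = 7
delta, eps, delta2, eps2 = 3, 1, 3, 6          # eps' = -1 = 6 in F_7
P = [[6, 1, 4], [1, 5, 2], [4, 2, 3]]
Q = [[3, 2, 4], [2, 5, 1], [4, 1, 6]]          # Q = P^{-1}
V = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]          # V = I as in the example
U = [[sum(V[i][l] * P[l][j] for l in range(3)) % q for j in range(3)] for i in range(3)]
u = [[U[r][c] for r in range(3)] for c in range(3)]   # columns u_1, u_2, u_3
v = [[V[r][c] for r in range(3)] for c in range(3)]   # columns v_1, v_2, v_3


def comb(s1, w1, s2, w2):
    """s1*w1 + s2*w2 as a vector over F_q."""
    return [(s1 * e + s2 * f) % q for e, f in zip(w1, w2)]


def det3(M):
    (a, b, c), (d, e, f), (g, h, i) = M
    return (a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)) % q


def matrix14(a, x, y, z):
    lead = eps - (eps + delta) * P[a][x] * Q[x][a]
    return [comb(lead, u[x], delta * P[a][x], v[a]),
            comb(delta, u[y], eps * P[a][y], v[a]),
            comb(delta, u[z], eps * P[a][z], v[a])]


def matrix15(x, a, b, c):
    lead = eps2 - (eps2 + delta2) * Q[x][a] * P[a][x]
    return [comb(lead, v[a], delta2 * Q[x][a], u[x]),
            comb(delta2, v[b], eps2 * Q[x][b], u[x]),
            comb(delta2, v[c], eps2 * Q[x][c], u[x])]


ok = all(det3(matrix14(a, x, y, z)) != 0 and det3(matrix15(x, a, b, c)) != 0
         for (a, b, c) in permutations(range(3))
         for (x, y, z) in permutations(range(3)))
print(ok)
```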

The next proposition is useful in checking whether these two determinants are non-zero.

Proposition 5. Suppose that $V$, $P$, $\epsilon$, $\delta$, $\epsilon'$ and $\delta'$ satisfy the criteria in Theorem 3 for $k = 3$. Then the determinants of the matrices in (14) and (15) are non-zero.

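Proof: Consider the matrix in (14). We divide the proof into two cases.

Case 1: $\epsilon - (\epsilon + \delta) p_{ax} q_{xa} = 0$. Elementary row operations reduce the matrix in (14) to

$$\begin{bmatrix} \delta p_{ax} (q_{xa} u_x^t + q_{ya} u_y^t + q_{za} u_z^t) \\ \delta u_y^t \\ \delta u_z^t \end{bmatrix}, \tag{16}$$

where we have substituted $v_a = q_{xa} u_x + q_{ya} u_y + q_{za} u_z$. Subtracting suitable multiples of the second and third rows from the first row further reduces (16) to the matrix with rows $\delta p_{ax} q_{xa} u_x^t$, $\delta u_y^t$ and $\delta u_z^t$, which is non-singular because $\delta$, $p_{ax}$ and $q_{xa}$ are non-zero (all entries of the super-regular matrices $P$ and $Q$ are non-zero) and $u_x$, $u_y$, $u_z$ are linearly independent.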

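Case 2: $\epsilon - (\epsilon + \delta) p_{ax} q_{xa} \neq 0$. After substituting $v_a$ by $q_{xa} u_x + q_{ya} u_y + q_{za} u_z$, the matrix in (14) can be factored as

$$\begin{bmatrix} \epsilon - \epsilon p_{ax} q_{xa} & \delta p_{ax} q_{ya} & \delta p_{ax} q_{za} \\ \epsilon p_{ay} q_{xa} & \delta + \epsilon p_{ay} q_{ya} & \epsilon p_{ay} q_{za} \\ \epsilon p_{az} q_{xa} & \epsilon p_{az} q_{ya} & \delta + \epsilon p_{az} q_{za} \end{bmatrix} \begin{bmatrix} u_x^t \\ u_y^t \\ u_z^t \end{bmatrix}. \tag{17}$$

The non-singularity of (14) is equivalent to the non-singularity of the first factor in (17), which in turn can be decomposed as

$$\begin{bmatrix} \epsilon - (\epsilon + \delta) p_{ax} q_{xa} & 0 & 0 \\ 0 & \delta & 0 \\ 0 & 0 & \delta \end{bmatrix} + \begin{bmatrix} \delta p_{ax} \\ \epsilon p_{ay} \\ \epsilon p_{az} \end{bmatrix} \begin{bmatrix} q_{xa} & q_{ya} & q_{za} \end{bmatrix}.$$

The first summand is non-singular because $\epsilon - (\epsilon + \delta) p_{ax} q_{xa}$ and $\delta$ are non-zero. By the Sherman-Morrison formula [14, p. 18], the matrix in (14) is invertible if

$$1 + \begin{bmatrix} q_{xa} & q_{ya} & q_{za} \end{bmatrix} \begin{bmatrix} \epsilon - (\epsilon + \delta) p_{ax} q_{xa} & 0 & 0 \\ 0 & \delta & 0 \\ 0 & 0 & \delta \end{bmatrix}^{-1} \begin{bmatrix} \delta p_{ax} \\ \epsilon p_{ay} \\ \epsilon p_{az} \end{bmatrix} = 1 + \frac{\delta p_{ax} q_{xa}}{\epsilon - (\epsilon + \delta) p_{ax} q_{xa}} + \frac{\epsilon}{\delta} (p_{ay} q_{ya} + p_{az} q_{za})$$

is non-zero. Since $p_{ax} q_{xa} + p_{ay} q_{ya} + p_{az} q_{za} = 1$ (the $(a, a)$-entry of $PQ = I$), the above expression can be simplified to

$$\frac{\epsilon (\epsilon + \delta)(1 - p_{ax} q_{xa})^2}{\delta (\epsilon - (\epsilon + \delta) p_{ax} q_{xa})},$$

which is non-zero because $p_{ax} q_{xa} \neq 1$, $\epsilon \neq 0$ and $\epsilon + \delta \neq 0$ (recall that $\delta^2 \neq \epsilon^2$).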

The proof that the determinant of the matrix in (15) is non-zero is similar. ■
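The algebra in Case 2 can be double-checked with exact rational arithmetic. The Python sketch below is our own check, not part of the original proof: it draws random non-zero rationals for $\epsilon$, $\delta$ and the relevant entries of $P$ and $Q$, imposes $p_{ax} q_{xa} + p_{ay} q_{ya} + p_{az} q_{za} = 1$, and confirms both the simplified form of the Sherman-Morrison quantity and the matrix determinant lemma $\det(D + c\, r^t) = \det(D)(1 + r^t D^{-1} c)$ for the $3 \times 3$ coefficient matrix of Case 2. Since these are identities of rational functions, a rational check is a reasonable (though not exhaustive) sanity test of the field computation.

```python
import random
from fractions import Fraction


def det3(M):
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def check_case2(seed: int) -> bool:
    rng = random.Random(seed)

    def r():
        return Fraction(rng.randint(1, 9), rng.randint(1, 9))

    eps, delta = r(), r()
    p_ax, p_ay, p_az, q_xa, q_ya = r(), r(), r(), r(), r()
    # Impose the (a, a)-entry of PQ = I: p_ax q_xa + p_ay q_ya + p_az q_za = 1.
    q_za = (1 - p_ax * q_xa - p_ay * q_ya) / p_az
    t = p_ax * q_xa
    lead = eps - (eps + delta) * t
    if lead == 0:          # Case 1 is handled separately in the proof
        return True
    # Coefficient matrix of (14) with respect to u_x, u_y, u_z.
    M = [[eps - eps * t, delta * p_ax * q_ya, delta * p_ax * q_za],
         [eps * p_ay * q_xa, delta + eps * p_ay * q_ya, eps * p_ay * q_za],
         [eps * p_az * q_xa, eps * p_az * q_ya, delta + eps * p_az * q_za]]
    # Sherman-Morrison quantity 1 + r^t D^{-1} c with D = diag(lead, delta, delta).
    sm = 1 + q_xa * delta * p_ax / lead + eps * (q_ya * p_ay + q_za * p_az) / delta
    closed = eps * (eps + delta) * (1 - t) ** 2 / (delta * lead)
    return sm == closed and det3(M) == lead * delta ** 2 * sm


print(all(check_case2(s) for s in range(500)))
```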

By Proposition 5, if $p_{ij} q_{ji} \neq 1$ for $1 \le i, j \le 3$, then we can jointly repair one systematic node and one parity-check node in a cooperative manner.

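Numerical Example. Let $q = 7$. We use $V = I$, the $3 \times 3$ identity matrix. The matrix $P$ is a Cauchy matrix and $Q$ is the inverse of $P$:

$$P = \begin{bmatrix} 6 & 1 & 4 \\ 1 & 5 & 2 \\ 4 & 2 & 3 \end{bmatrix}, \qquad Q = P^{-1} = \begin{bmatrix} 3 & 2 & 4 \\ 2 & 5 & 1 \\ 4 & 1 & 6 \end{bmatrix}.$$

The $(i, j)$-entry of $P$ is obtained by $p_{ij} = 1/(r_i - s_j)$, with $(r_1, r_2, r_3) = (1, 3, 4)$ and $(s_1, s_2, s_3) = (2, 0, 6)$. Since the six elements $1, 3, 4, 2, 0, 6$ are distinct, $P$ is super-regular. We set $\delta = \delta' = 3$ and $\epsilon = -\epsilon' = 1$. The parity-check symbols in $Y$ are generated by

$$Y = [y_1 \; y_2 \; y_3] = (3X^t + X) P.$$

We check that the conditions in Theorem 3 are satisfied: modulo 7, $\delta\delta' + \epsilon\epsilon' = 9 - 1 \equiv 1$ and $\epsilon\delta' + \delta\epsilon' = 3 - 3 = 0$, and the products $p_{ij} q_{ji}$ equal 4 if $i = j$ and 2 if $i \neq j$, so none of them equals 1.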
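The MDS property of this example can also be confirmed by brute force. The Python sketch below is our own check (helper names are hypothetical): it writes the encoder $Y = (3X^t + X)P$ over $\mathbb{F}_7$ as an $18 \times 9$ generator matrix by encoding the nine unit source matrices, and then verifies that every choice of three nodes out of six yields a $9 \times 9$ submatrix of full rank, i.e., that any three nodes suffice to rebuild the nine source symbols.

```python
from itertools import combinations

q = 7
P = [[6, 1, 4], [1, 5, 2], [4, 2, 3]]


def encode(X):
    """Contents of nodes 1..6 for source matrix X, with Y = (3 X^t + X) P over F_7."""
    A = [[(3 * X[c][r] + X[r][c]) % q for c in range(3)] for r in range(3)]
    Y = [[sum(A[r][l] * P[l][j] for l in range(3)) % q for j in range(3)]
         for r in range(3)]
    return ([[X[r][i] for r in range(3)] for i in range(3)]
            + [[Y[r][j] for r in range(3)] for j in range(3)])


def rank_mod(M):
    """Rank over F_q by Gauss-Jordan elimination."""
    M = [row[:] for row in M]
    rank = 0
    for c in range(len(M[0])):
        piv = next((r for r in range(rank, len(M)) if M[r][c] % q), None)
        if piv is None:
            continue
        M[rank], M[piv] = M[piv], M[rank]
        inv = pow(M[rank][c], -1, q)
        M[rank] = [e * inv % q for e in M[rank]]
        for r in range(len(M)):
            if r != rank and M[r][c] % q:
                f = M[r][c]
                M[r] = [(e - f * g) % q for e, g in zip(M[r], M[rank])]
        rank += 1
    return rank


# Encoding is linear, so encoding the 9 unit source matrices gives the generator.
unit = [encode([[int(3 * c + r == s) for c in range(3)] for r in range(3)])
        for s in range(9)]
rows = {n: [[unit[s][n][r] for s in range(9)] for r in range(3)] for n in range(6)}
mds = all(rank_mod(rows[a] + rows[b] + rows[c]) == 9
          for a, b, c in combinations(range(6), 3))
print(mds)
```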

V. Sketch of Proof of Theorem 3

Let the $B = k^2$ entries in $X$ be the source symbols, and the entries in $Y$ be the parity-check symbols calculated by (4). We outline how to exactly repair one systematic node and one parity-check node. Suppose nodes $a$ and $k + x$ fail, where $a$ and $x$ are integers between 1 and $k$. We want to replace them by newcomer $a$ and newcomer $k + x$. Let $[k]$ denote $\{1, 2, \ldots, k\}$.

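After the first phase of the repair process, newcomer $a$ receives $v_a^t x_i$, for $i \in [k] \setminus \{a\}$, and

$$v_a^t y_j = \delta u_j^t x_a + \epsilon v_a^t z_j,$$

for $j \in [k] \setminus \{x\}$. Newcomer $k + x$ receives $u_x^t x_i$, for $i \in [k] \setminus \{a\}$, and

$$u_x^t y_j = \delta u_j^t z_x + \epsilon u_x^t z_j,$$

for $j \in [k] \setminus \{x\}$.

Newcomer $k + x$ computes the linear combination

$$\sum_{j \neq x} q_{ja} u_x^t y_j + (\delta + \epsilon) \sum_{i \neq a} p_{ix} q_{xa} u_x^t x_i = \delta v_a^t z_x + (\epsilon - (\epsilon + \delta) p_{ax} q_{xa}) u_x^t x_a,$$

where $j$ runs over $[k] \setminus \{x\}$ and $i$ runs over $[k] \setminus \{a\}$, and sends it to newcomer $a$ in the second phase. Newcomer $a$ then calculates

$$(\delta p_{ax} v_a^t + (\epsilon - (\epsilon + \delta) p_{ax} q_{xa}) u_x^t) x_a$$

and $(\delta u_j^t + \epsilon p_{aj} v_a^t) x_a$ for $j \in [k] \setminus \{x\}$. Similarly to Proposition 5, newcomer $a$ can recover $x_a$ if $p_{ax} q_{xa} \neq 1$.

Using the symmetry of the code, newcomer $k + x$ can recover the lost information after receiving

$$\delta' u_x^t z'_a + (\epsilon' - (\epsilon' + \delta') q_{xa} p_{ax}) v_a^t y_x$$

from newcomer $a$, provided that $p_{ax} q_{xa} \neq 1$. Each newcomer thus downloads $2k - 2$ packets from the $d = 2k - 2$ surviving nodes and receives one packet from the other newcomer, for a total of $2k - 1$ packets, which equals the bound in (1) with $B = k^2$ and $r = 2$.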
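The identity behind the second-phase message holds for every $k$. The Python sketch below is our own check with hypothetical parameters ($k = 4$, the prime field $\mathbb{F}_{101}$, $\delta = 2$, $\epsilon = 3$, a Cauchy matrix built from $(1, 2, 3, 4)$ and $(5, 6, 7, 8)$, and a random non-singular $V$). It computes the message exactly as newcomer $k + x$ would, from its own downloads only, and compares it with the closed form $\delta v_a^t z_x + (\epsilon - (\epsilon + \delta) p_{ax} q_{xa}) u_x^t x_a$ for every pair $(a, x)$.

```python
import random

p, k = 101, 4
delta, eps = 2, 3                           # hypothetical; delta^2 != eps^2 mod 101
a_pts, b_pts = (1, 2, 3, 4), (5, 6, 7, 8)   # hypothetical Cauchy parameters
rng = random.Random(2024)


def matmul(A, B):
    return [[sum(A[i][l] * B[l][j] for l in range(len(B))) % p
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


def inverse(A):
    """Gauss-Jordan inversion over F_p; raises ValueError if A is singular."""
    n = len(A)
    M = [list(row) + [int(i == j) for j in range(n)] for i, row in enumerate(A)]
    for c in range(n):
        piv = next((r for r in range(c, n) if M[r][c] % p), None)
        if piv is None:
            raise ValueError("singular matrix")
        M[c], M[piv] = M[piv], M[c]
        inv = pow(M[c][c], -1, p)
        M[c] = [e * inv % p for e in M[c]]
        for r in range(n):
            if r != c and M[r][c] % p:
                f = M[r][c]
                M[r] = [(e - f * g) % p for e, g in zip(M[r], M[c])]
    return [row[n:] for row in M]


def dot(a, b):
    return sum(s * t for s, t in zip(a, b)) % p


P = [[pow((ai - bj) % p, -1, p) for bj in b_pts] for ai in a_pts]
Q = inverse(P)
while True:                                 # draw a random non-singular V
    V = [[rng.randrange(p) for _ in range(k)] for _ in range(k)]
    try:
        V_bar = inverse(transpose(V))
        break
    except ValueError:
        pass
U = matmul(V, P)
X = [[rng.randrange(p) for _ in range(k)] for _ in range(k)]
Y = [[(delta * s + eps * t) % p for s, t in zip(r1, r2)]
     for r1, r2 in zip(matmul(matmul(V_bar, transpose(X)), U), matmul(X, P))]
u, v, x, y = transpose(U), transpose(V), transpose(X), transpose(Y)
z = transpose(matmul(X, P))                 # z_j = sum_i p_ij x_i (only for checking)


def message(a, t):
    """Symbol newcomer k + t sends to newcomer a, built from its own downloads."""
    ux = {i: dot(u[t], x[i]) for i in range(k) if i != a}
    uy = {j: dot(u[t], y[j]) for j in range(k) if j != t}
    return (sum(Q[j][a] * uy[j] for j in uy)
            + (delta + eps) * sum(P[i][t] * Q[t][a] * ux[i] for i in ux)) % p


def expected(a, t):
    c = eps - (eps + delta) * P[a][t] * Q[t][a]
    return (delta * dot(v[a], z[t]) + c * dot(u[t], x[a])) % p


print(all(message(a, t) == expected(a, t) for a in range(k) for t in range(k)))
```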

VI. Concluding Remarks 

In this paper we show that in the regenerating code constructed by Suh and Ramchandran, which was originally designed for repairing any single node failure, multiple-node failures can also be repaired cooperatively with optimal repair bandwidth. In particular, we can repair any set of systematic nodes, any set of parity-check nodes, or any pair of nodes. However, the technique used in this paper cannot be extended to the repair of one systematic node and two parity-check nodes, because it would require $\epsilon + \delta = 0$, which violates the conditions in Theorem 3 (they imply $\delta^2 \neq \epsilon^2$).